METHODS, SYSTEMS AND COMPUTER PROGRAMS FOR
DECONVOLVING THE SPECTRAL CONTRIBUTION OF CHEMICAL 
CONSTITUENTS WITH OVERLAPPING SIGNALS 

Related Application 

This application claims priority to U.S. Provisional Application Serial No. 
60/421,177, filed October 25, 2002, the contents of which are hereby incorporated by 
reference as if recited in full herein. 


Field of the Invention 

The present invention relates generally to methods of analyzing samples that
produce composite signals in which individual constituents have overlapping signals. The invention may be
particularly suitable for NMR analysis of signals associated with biosamples, such as
signals of lipoprotein constituents in blood plasma and serum.

Background of the Invention 
In the past, several analysis methods have been used to evaluate target
materials, such as biosamples, that can generate a plurality of signals, each associated
with a respective individual constituent in a complex mixture comprising a plurality
of constituents, some of which may have overlapping signal contributions. The
overlapping signals can produce a composite spectrum that can be deconvolved to
estimate or determine the amount and/or presence of selected individual constituents or subsets of
constituents in the complex mixture. Such analysis methods include, but are not
limited to, spectroscopy, chromatography, and the like.

NMR spectroscopic evaluations of in vitro biosamples have also been
used to identify the presence of and/or measure the concentration or amounts of
selected constituents in a complex mixture, the constituents having associated
chemical lineshapes and/or peaks in the obtained NMR signal. For example, U.S.
Patent No. 4,933,844, entitled Measurement of Blood Lipoprotein Constituents by
Analysis of Data Acquired from an NMR Spectrometer, to Otvos, and U.S. Patent No.
5,343,389, entitled Method and Apparatus for Measuring Classes and Subclasses of
Lipoproteins, also to Otvos, describe NMR evaluation techniques that concurrently
obtain and then measure a plurality of different lipoprotein constituents in an in vitro
blood or plasma sample. See also U.S. Patent Application Serial No. 10/208,371,
entitled Method Of Determining Presence And Concentration Of Lipoprotein X In
Blood Plasma And Serum. The contents of the above patents and patent application are
hereby incorporated by reference as if recited in full herein.

Generally described, to evaluate the lipoproteins in a blood plasma and/or
serum sample, the amplitudes and/or lineshapes of a plurality of NMR-spectroscopy-derived
signals within a chemical shift region of the NMR spectrum are deconvolved
from the composite signal or spectrum, each signal component so deconvolved being
associated with respective lipoprotein subclass constituents of interest or selected
related groupings of subclass constituents of interest in the sample. The values of at
least one selected subclass constituent (or groupings of selected subclass constituents)
are compared to predetermined test criteria to evaluate a patient's risk of having or
developing coronary artery or heart disease. Similarly, NMR spectroscopy evaluation
of lipoproteins has been proposed to evaluate a patient's risk of having or
developing insulin resistance, Type 2 diabetes, or related disorders. See U.S. Patent
Application Serial No. 09/550,359, entitled Methods and Computer Program
Products for Determining Risk of Developing Type 2 Diabetes and Other Insulin
Resistance Related Disorders, the contents of which are hereby incorporated by
reference as if recited in full herein.

Referring to Figure 1, the constituents of certain lipoprotein subclasses
have overlapping signals. For example, low-density lipoprotein ("LDL")
constituent values, shown for clarity as only two (L2 and L5) LDL subclass
constituent values, can overlap considerably when presented on a spectrum graph of
signal intensity versus ppm. The overlapping nature of the signals produces a
regression matrix that is nearly singular. Unfortunately, in conventional statistical
evaluation methods that employ non-negative least squares techniques on nearly
collinear data, the regression coefficients may be unstable (small changes in the
measured signal, such as noise, can produce large changes in the fitted coefficients)
and, hence, variable. See Myers, Raymond H., Classical and Modern Regression with
Applications (2d ed., Mass.: PWS-Kent, 1990); Box et al., Statistics for Experimenters:
An Introduction to Design, Data Analysis, and Model Building (New York: Wiley, 1978).
The potential instability in the regression coefficients can force the non-negative least squares
analysis to set certain constituent coefficients to zero, although these constituents may
be more correctly identified as having small positive values when analyzed properly.
Further, although conventional methods are thought to be adequate for many clinical
or other applications, particularly in light of the margin of error introduced by other
testing methodologies, the instability may impede the statistical robustness or
reproducibility of certain measurement results.
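The effect of overlap on the regression matrix can be illustrated numerically. The sketch below compares the condition number of a two-column design matrix whose lineshapes nearly coincide with one whose lineshapes are well separated. It assumes synthetic Lorentzian lineshapes rather than actual lipoprotein subclass spectra; the function `lorentzian`, the ppm grid, the peak centers and the line widths are all hypothetical.

```python
import numpy as np


def lorentzian(ppm, center, width):
    """Unit-height Lorentzian; a hypothetical stand-in for a subclass lineshape."""
    return 1.0 / (1.0 + ((ppm - center) / width) ** 2)


# Hypothetical chemical-shift grid loosely covering a methyl-type region.
ppm = np.linspace(0.70, 0.95, 250)

# Design matrices with one column per constituent lineshape:
# centers 0.005 ppm apart (heavily overlapping) vs. 0.12 ppm apart.
overlapping = np.column_stack([lorentzian(ppm, 0.800, 0.02),
                               lorentzian(ppm, 0.805, 0.02)])
separated = np.column_stack([lorentzian(ppm, 0.760, 0.02),
                             lorentzian(ppm, 0.880, 0.02)])

# A large 2-norm condition number means small perturbations of the composite
# signal (e.g., noise) can cause large swings in the fitted coefficients.
cond_overlapping = np.linalg.cond(overlapping)
cond_separated = np.linalg.cond(separated)
print(f"overlapping: {cond_overlapping:.1f}  separated: {cond_separated:.1f}")
```

The nearly coincident pair yields a condition number roughly an order of magnitude larger than the separated pair, and the gap widens as the lineshapes move closer together.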

In view of the foregoing, there remains a need to provide improved 
deconvolution methods. 

Summary of Embodiments of the Invention 

Certain embodiments of the present invention are directed to
methods, systems, and computer program products that include operations that can
provide increased robustness or stability in the fitting models and regression analysis
protocols, and/or increased reproducibility in the measurement or evaluation of signals having
spectral contributions from chemical constituents with overlapping signals.

The inventive methods, systems and computer program products may be able
to interrogate the signal data of selected constituents in a composite spectrum signal
of an unknown sample to identify the spectra or portion of the composite spectrum
that is attributable to noise or to non-relevant constituents or contributors rather than to
selected constituents of interest, and thereby selectively discount or disregard the noise
or non-relevant constituents in the analysis. A constituent may be non-relevant
if it is not a target of the analysis and, hence, not of primary or major interest in the
end computation or output; however, the non-relevant constituent can contribute to
the signal in a region of interest in the obtained composite data spectrum. Thus, in
certain embodiments, operations contemplated by the present invention can consider
the influence of these non-relevant contributors in the analysis to more properly assess
or estimate the "true" values of the relevant constituents, which can increase the accuracy
and/or reproducibility of the estimates of the relevant or target constituents of interest.

The evaluation may be particularly suitable for composite signals having at
least about 10 individual constituents with highly or closely correlated signals. In
certain embodiments, the evaluation can be performed on samples that have between
30 and 45, typically at least about 37, individual constituents. In certain embodiments,
operations are carried out to generate the weighting coefficients without relying on
negative values in the least squares solution set. Thus, the operations may decrease
the number of constituents in the composite that have coefficients set to zero
(previously associated with constituents identified as having negative signal values).

In certain embodiments, the operations can be carried out to more reliably
measure discrete spectral contributions of chemical constituents with improved signal
resolution, without suppressing data for constituents of interest that previously may
have been identified as having negative values. The evaluation can allow for
improved resolution in assessing constituent values when deconvolving a complex
mixture of constituents to measure or identify selected constituents within a
composite spectrum having signal contributions associated with spatially overlapping
individual constituent signals.

Methods for determining the presence of and/or a measurement for a plurality
of constituents in a composite signal extending about a spectrum of interest obtained
from a target sample undergoing analysis include: (a) generating a mathematical
design matrix of constituent data comprising a plurality of selected individual
mathematical constituent matrix data sets, each constituent matrix data set including
constituent amplitude values of a respective spectrum lineshape of a selected
independent parameter over a desired number of data points of a known reference
sample that is generated by a predetermined analysis method; (b) generating a
composite mathematical matrix comprising a data set of amplitude values of a
composite spectrum lineshape of the selected independent parameter over the desired
number of data points for a target sample undergoing analysis that is generated by the
predetermined analysis method, the composite lineshape comprising spectral
contributions from a plurality of the selected individual constituents included in the
design matrix; (c) rotating the design matrix to provide a rotated design matrix of
principal components; (d) selectively excluding data corresponding to certain of the
principal components in the rotated design matrix; (e) generating a reduced design
matrix based on the steps of rotating and excluding; and (f) computing regression fit
weighting coefficients based on data in the reduced design matrix and the composite
matrix for the plurality of individual constituents to determine the presence of and/or
measurement of the selected constituents in the target sample.
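Steps (c) through (f) resemble the technique commonly called principal component regression. The Python/NumPy sketch below is one possible reading, not the patented implementation. It assumes the rotation is taken from a singular value decomposition (SVD) of the uncentered design matrix and that "excluding" simply keeps the `keep` leading components. The function name `pcr_weights`, the parameter `keep`, and the synthetic Lorentzian lineshapes are hypothetical.

```python
import numpy as np


def pcr_weights(design, composite, keep):
    """Principal-component-regression sketch of steps (c)-(f).

    design    : (n_points, n_constituents) reference lineshapes, one column each.
    composite : (n_points,) composite lineshape of the target sample.
    keep      : number of leading principal components retained (a hypothetical
                tuning choice; the exclusion criterion is an assumption here).
    Returns one regression weighting coefficient per constituent.
    """
    # (c) Rotate: the right singular vectors of the (uncentered) design matrix
    # define an orthogonal rotation onto its principal components.
    _, _, vt = np.linalg.svd(design, full_matrices=False)
    rotation = vt.T
    # (d)-(e) Exclude the trailing, lowest-variance components to form the
    # reduced design matrix of principal-component scores.
    kept = rotation[:, :keep]
    reduced = design @ kept
    # (f) Least-squares fit in the reduced space, then rotate back so that
    # each coefficient again refers to an individual constituent.
    gamma, *_ = np.linalg.lstsq(reduced, composite, rcond=None)
    return kept @ gamma


ppm = np.linspace(0.70, 0.95, 250)
design = np.column_stack([1.0 / (1.0 + ((ppm - c) / 0.02) ** 2)
                          for c in (0.78, 0.80, 0.82)])
true_weights = np.array([1.0, 0.5, 2.0])
composite = design @ true_weights

w_all = pcr_weights(design, composite, keep=3)      # equals ordinary least squares
w_reduced = pcr_weights(design, composite, keep=2)  # smallest component dropped
```

With every component retained, the result reduces to ordinary least squares and recovers the synthetic weights. Dropping the smallest-variance components trades a small bias for lower sensitivity to noise along nearly collinear directions.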

The computing step can also employ a sequential regression to ensure non-negativity
of the weighting coefficients. Stated differently, the computing step can
include a sequential least squares restraint in a statistical regression analysis to force
the defined weighting coefficients of target constituents of interest to be non-negative.
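A sequential restraint of this kind can be sketched as a loop that removes the most negative coefficient and refits. This assumes that "restraining" a constituent means fixing its weight at zero. The function `sequential_nonneg_fit` and the synthetic data are hypothetical. A formal non-negative least squares solver, such as the Lawson-Hanson active-set method, is a more rigorous alternative.

```python
import numpy as np


def sequential_nonneg_fit(design, composite):
    """Sequential least-squares sketch that restrains negative weights to zero.

    After each unconstrained fit, the constituent with the most negative
    coefficient is fixed at zero and removed; the remaining constituents are
    refit until every coefficient is non-negative. This simple elimination
    loop is an assumed illustration, not necessarily the patented procedure.
    """
    n = design.shape[1]
    active = list(range(n))
    weights = np.zeros(n)
    while active:
        coef, *_ = np.linalg.lstsq(design[:, active], composite, rcond=None)
        if np.all(coef >= 0):
            weights[active] = coef
            break
        active.pop(int(np.argmin(coef)))  # drop the most negative coefficient
    return weights


ppm = np.linspace(0.70, 0.95, 250)
design = np.column_stack([1.0 / (1.0 + ((ppm - c) / 0.02) ** 2)
                          for c in (0.78, 0.80, 0.82)])
# A composite whose unconstrained fit gives the middle constituent a negative weight.
composite = design @ np.array([1.0, -0.3, 2.0])
weights = sequential_nonneg_fit(design, composite)
```

After the middle constituent is restrained to zero, its negative contribution is absorbed by the overlapping neighbors, which remain positive.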
The constituent matrix can include at least 10 different data sets, each
representing a respective one of at least 10 different closely correlated chemical
constituents, such that a plurality of the constituents have overlapping signal
lineshapes in the analyzed region of the spectrum.

In certain embodiments, the predetermined analysis method is NMR
spectroscopy and the composite and reference constituent signals represent intensity
over a desired interval or region in a chemical shift spectrum such that intensity is the
independent variable parameter.

The signals (sample and reference) may be spectrally aligned to a reference
signal. For example, for NMR analysis of plasma samples, the signal can be
referenced to the sharp NMR resonance peak produced by the calcium complex of
EDTA (CaEDTA) that is present in the sample. The sample spectrum and the reference spectra
can be shifted as needed to align the CaEDTA peak at 2.519 ppm on the horizontal
scale. In addition, in certain embodiments, operations of the present invention can be
carried out using alternative internal references for signal alignment. For
example, glucose or lactate signals or other desired constituent references can be used
for spectral alignment purposes.
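The CaEDTA alignment can be sketched in three steps: locate the reference peak within a search window, compute its offset from 2.519 ppm, and resample the shifted spectrum onto the common ppm grid by linear interpolation. The function name `align_to_reference`, the 2.45 to 2.60 ppm search window, and the synthetic peak are assumptions for illustration only. The same routine could target a glucose or lactate peak by changing `target` and `window`.

```python
import numpy as np

CAEDTA_PPM = 2.519  # alignment position stated in the text


def align_to_reference(ppm, intensity, target=CAEDTA_PPM, window=(2.45, 2.60)):
    """Shift a spectrum so its tallest peak inside `window` sits at `target` ppm.

    `ppm` must be increasing. The search window is a hypothetical setting.
    Returns the shifted intensities resampled onto the original grid and the
    applied offset in ppm.
    """
    idx = np.flatnonzero((ppm >= window[0]) & (ppm <= window[1]))
    observed = ppm[idx[np.argmax(intensity[idx])]]
    offset = target - observed
    # Relabel the axis by the offset, then interpolate back onto the common grid;
    # points shifted in from outside the acquired range are set to zero.
    aligned = np.interp(ppm, ppm + offset, intensity, left=0.0, right=0.0)
    return aligned, offset


ppm = np.linspace(2.30, 2.70, 401)                      # 0.001 ppm spacing
spectrum = 1.0 / (1.0 + ((ppm - 2.530) / 0.002) ** 2)   # reference peak found at 2.530 ppm
aligned, offset = align_to_reference(ppm, spectrum)
```

Here the peak observed at 2.530 ppm yields an offset of -0.011 ppm, and the aligned spectrum peaks at 2.519 ppm. Applying the same offset to every spectrum of a sample keeps the sample and reference spectra on one grid.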

Other aspects of the invention are directed to computer program products for
deconvolving the spectral contribution of a plurality of closely correlated constituents
in a composite signal. The computer program product includes a computer readable
storage medium having computer readable program code embodied in the medium.
The computer readable program code includes: (a) computer readable program code
that generates a design matrix of individual selected constituent data sets for a
plurality of different selected constituents in a spectrum of interest, each individual
selected constituent data set including amplitude values of its associated spectral
lineshape, wherein a plurality of the different selected constituents are closely
correlated with overlapping signal lineshapes in the spectrum of interest; (b) computer
readable program code that obtains a composite signal of a target sample undergoing
analysis and generates a composite matrix of amplitude values of the lineshape of the
composite signal in the spectrum of interest, the target sample comprising spectra
from a plurality of the selected closely correlated constituents that contribute to the
composite signal; (c) computer readable program code for rotating the design matrix;
(d) computer readable program code that generates a reduced design matrix; and
(e) computer readable program code that computes regression fit weighting
coefficients based on the design matrix, the reduced matrix, and the composite matrix
to thereby deconvolve the spectral contribution of at least one non-target variable
across the spectrum of interest in the composite signal.

Still other aspects of the present invention are directed toward methods of
deconvolving a complex signal to evaluate an in vitro biosample. The methods
include: (a) obtaining a plurality of individual NMR spectrum reference signals of
selected target constituents of interest in an in vitro biosample; (b) obtaining a
composite NMR spectrum signal of the in vitro biosample taken from a subject for
analysis, the composite signal including spectral contributions from a plurality of the
individual target constituents of interest; (c) generating a design matrix of individual
data sets of the amplitude of the respective reference constituents in the NMR
spectrum, the design matrix having columns or rows of data that correspond to
principal components that contribute to the spectral lineshape of the composite signal;
(d) rotating the design matrix; (e) generating a reduced design matrix of principal
component data by selectively excluding principal components that do not improve
the estimation of the target constituents in the composite signal; (f) deriving
regression fit weighting coefficients for the selected target constituents in the
composite signal; (g) generating a calculated composite lineshape for the sample, the
calculated lineshape being calculated based on the derived weighting coefficients of
respective constituent reference spectra of constituents potentially present in the
sample; and (h) determining the presence or absence of and/or the level or
concentration of at least one selected constituent in the sample.
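Step (g) can be sketched as a weighted sum of the constituent reference spectra, compared against the measured composite. The relative RMS residual used here is an assumed, illustrative goodness-of-fit measure that the text does not specify. The function `calculated_lineshape` and the synthetic data are hypothetical.

```python
import numpy as np


def calculated_lineshape(design, weights, composite):
    """Rebuild the calculated composite lineshape from the fitted weights.

    The calculated lineshape is the weighted sum of the constituent reference
    spectra (the columns of `design`). The relative RMS residual is an assumed,
    illustrative goodness-of-fit measure against the measured composite.
    """
    calculated = design @ weights
    residual = composite - calculated
    rel_rms = np.sqrt(np.mean(residual ** 2)) / np.sqrt(np.mean(composite ** 2))
    return calculated, rel_rms


ppm = np.linspace(0.70, 0.95, 250)
design = np.column_stack([1.0 / (1.0 + ((ppm - c) / 0.02) ** 2)
                          for c in (0.78, 0.80, 0.82)])
measured = design @ np.array([1.0, 0.5, 2.0])
fitted, *_ = np.linalg.lstsq(design, measured, rcond=None)
calc, rel_rms = calculated_lineshape(design, fitted, measured)
```

For noise-free synthetic data the calculated and measured lineshapes coincide. For a real sample, a large residual would suggest contributions that the reference set does not describe.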

Additional aspects of the present invention include apparatus for measuring
lipoprotein constituents in a subject. The apparatus includes: (a) an NMR
spectrometer for acquiring an NMR composite spectrum of a blood plasma or serum
sample; (b) computer program code defining a plurality of individual NMR
constituent spectra, each associated with a selected reference lipoprotein
constituent signal lineshape, each constituent spectrum having associated spectra that
contribute to the composite NMR spectrum of the blood plasma or serum sample; (c)
computer program means for generating a design matrix of the selected individual
constituents, the design matrix including data sets for each of the plurality of
individual lipoprotein constituents in a spectrum of interest, each individual selected
constituent data set including amplitude values of its associated spectral lineshape,
wherein a plurality of the selected individual constituents are closely correlated with
overlapping signal lineshapes in the spectrum of interest; (d) computer program
means that obtains a composite signal of a target sample undergoing analysis and
generates a composite matrix of amplitude values of the lineshape of the composite
signal in the spectrum of interest, the target sample comprising spectra from a
plurality of the selected individual constituents that contribute to the composite signal;
(e) computer program means for rotating the design matrix; (f) computer program
means for generating a reduced design matrix; (g) computer program means for
computing regression fit weighting coefficients based on the design matrix, the
reduced matrix, and the composite matrix to deconvolve the spectral contribution of at
least one non-target variable across the spectrum of interest in the composite signal;
(h) computer program means for applying a sequential least squares analysis to the
regression fit weighting coefficients to restrain negative coefficients to zero; (i)
computer program means for determining a calculated composite lineshape based on
the weighting coefficients; and (j) computer program means for determining the
concentrations of the lipoprotein constituents in the sample undergoing analysis.

Other aspects include apparatus for determining the presence of and/or a
measurement for a plurality of constituents in a composite signal extending about a
spectrum of interest obtained from a target sample undergoing analysis. The
apparatus includes: (a) means for generating a mathematical design matrix of
constituent data comprising a plurality of selected individual mathematical constituent
matrix data sets, each constituent matrix data set including constituent amplitude
values of a respective spectrum lineshape of a selected independent parameter over a
desired number of data points of a known reference sample that is generated by a
predetermined analysis method; (b) means for generating a composite mathematical
matrix comprising a data set of amplitude values of a composite spectrum lineshape
of the selected independent parameter over the desired number of data points for a
target sample undergoing analysis that is generated by the predetermined analysis
method, the composite lineshape comprising spectral contributions from a plurality of
the selected individual constituents included in the design matrix; (c) means for
rotating the design matrix to provide a rotated design matrix of principal components;
(d) means for selectively excluding data corresponding to certain of the principal
components in the rotated design matrix; (e) means for generating a reduced design
matrix based on the steps of rotating and excluding; and (f) means for computing
regression fit weighting coefficients based on data in the reduced design matrix and
the composite matrix for the plurality of individual constituents to determine the
presence of and/or measurement of the selected constituents in the target sample.

As will be appreciated by those of skill in the art in light of the present
disclosure, embodiments of the present invention may include methods, systems,
apparatus and/or computer program products or combinations thereof.

The foregoing and other objects and aspects of the present invention are 
explained in detail in the specification set forth below. 

Brief Description of the Figures

Figure 1 is a graph showing the chemical shift spectra of a representative
sample of lipoprotein constituent subclasses.

Figure 2 is a graph illustrating NMR spectra for a plasma sample and the
lipoprotein subclass and protein components thereof, with the peaks for methyl groups
being illustrated.

Figure 3 is a block diagram of operations that can be used to evaluate signal
data according to embodiments of the present invention.

Figure 4 is a block diagram of operations that can be used to evaluate signal
data according to embodiments of the present invention.

Figure 5 is a schematic diagram of an interrogation protocol used to evaluate
signal data for composite spectra having contributions from overlapping constituents
according to embodiments of the present invention.

Figure 6 is a schematic diagram of a data processing system according to
embodiments of the present invention.

Figure 7 is a schematic diagram of an apparatus for measuring lipoprotein
concentrations of a blood or plasma sample according to embodiments of the present
invention.

Figure 8 is a graph of the distribution of repeated NMR-derived
measurements of LDL concentration taken on sample SA and sample SB, analyzed
according to operations of the present invention compared to a conventional processing
method.

Figure 9 is a graph from a study of the distribution of repeated
NMR-derived LDL concentration measurements of over five hundred different
samples analyzed according to the operations provided by the present invention and
the conventional processing method.

Detailed Description of Embodiments of the Invention 

The present invention will now be described more fully hereinafter, in which
embodiments of the invention are shown. This invention may, however, be embodied in
different forms and should not be construed as limited to the embodiments set forth
herein. Rather, these embodiments are provided so that this disclosure will be thorough
and complete, and will fully convey the scope of the invention to those skilled in the art.

In the drawings, like numbers refer to like elements throughout, and the thickness, size and
dimensions of some components, lines, or features may be exaggerated for clarity. The
order of operations and/or steps illustrated in the figures or recited in the claims is not
intended to be limited to the order presented unless stated otherwise. Broken lines
in the figures indicate that the feature or step so indicated is optional.

In certain embodiments, the methods, systems, and/or computer products
provided by the present invention employ statistical fitting models that evaluate
signal data of an unknown sample according to a predetermined fitting model and
standards to identify the presence of at least one selected chemical constituent and/or
to measure the level or concentration thereof in the sample. More typically, the
models, programs, and methods of the present invention are configured to evaluate
signal data of a composite sample with highly or closely correlated individual
constituent spectra (at least a plurality of which have overlapping signal lines in the
spectrum) to identify the presence and/or the level of at least 10 different individual
constituents. The terms "highly" and "closely" are used interchangeably
when used with "correlated," so that in the description that follows either "highly
correlated" or "closely correlated" means that a plurality of constituents in a sample
being analyzed generate respective spectra that can overlap in a composite signal
that includes spectral contributions from those constituents.

Although described herein primarily with respect to NMR-derived spectroscopic
signal evaluation of blood and/or plasma samples to determine lipoprotein values (such
as particle size and/or concentrations), the present invention is not limited thereto. The
operations of the present invention may be used to evaluate any suitable target sample,
such as engineered, fabricated or forensic materials undergoing material science, quality
inspection, forensic or other evaluations, as well as environmental samples or other
types of biosamples. Examples of additional biosamples include, but are not limited to,
biopsy tissue samples, biofluid samples, stool samples, and the like. In addition, the
present invention is not intended to be limited to NMR spectroscopy signals, as other
analysis techniques are contemplated. It is contemplated that
operations of the present invention can find application in other spectroscopic analysis,
chromatography, SEM (scanning electron microscope) evaluation, optical signal
evaluation, gel electrophoresis, and any other analysis techniques that can benefit from
deconvolving the spectral contribution of several chemical, optical (fluorescence,
radiation, transmittance, reflectance and the like), or other measurable constituents
having overlapping signals or spectral contributions in a composite signal.

As will be appreciated by one of skill in the art, the present invention may be
embodied as an apparatus, a method, a data or signal processing system, or a computer
program product. Accordingly, the present invention may take the form of an entirely
software embodiment or an embodiment combining software and hardware aspects.
Furthermore, certain embodiments of the present invention may take the form of a
computer program product on a computer-usable storage medium having computer-usable
program code means embodied in the medium. Any suitable computer readable
medium may be utilized, including hard disks, CD-ROMs, optical storage devices, or
magnetic storage devices.

The computer-usable or computer-readable medium may be, but is not limited
to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor
system, apparatus, device, or propagation medium. More specific examples (a
nonexhaustive list) of the computer-readable medium include the following:
an electrical connection having one or more wires, a portable computer diskette, a
random access memory (RAM), a read-only memory (ROM), an erasable
programmable read-only memory (EPROM or Flash memory), an optical fiber, and a
portable compact disc read-only memory (CD-ROM). Note that the computer-usable
or computer-readable medium could even be paper or another suitable medium upon
which the program is printed, as the program can be electronically captured, via, for
instance, optical scanning of the paper or other medium, then compiled, interpreted or
otherwise processed in a suitable manner if necessary, and then stored in a computer
memory.

Computer program code for carrying out operations of the present invention
may be written in an object-oriented programming language such as Java®, Smalltalk,
Python, LabVIEW, C++, or Visual Basic. However, the computer program code for
carrying out operations of the present invention may also be written in conventional
procedural programming languages, such as the "C" programming language or even
assembly language. The program code may execute entirely on the user's computer,
partly on the user's computer, as a stand-alone software package, partly on the user's
computer and partly on a remote computer, or entirely on the remote computer. In the
latter scenario, the remote computer may be connected to the user's computer through
a local area network (LAN) or a wide area network (WAN), or the connection may be
made to an external computer (for example, through the Internet using an Internet
Service Provider).

The flowcharts and block diagrams of certain of the figures herein illustrate
the architecture, functionality, and operation of possible implementations of analysis
models and evaluation systems and/or programs according to the present invention.
In this regard, each block in the flowcharts or block diagrams represents a module,
segment, operation, or portion of code, which comprises one or more executable
instructions for implementing the specified logical function(s). It should also be
noted that in some alternative implementations, the functions noted in the blocks may
occur out of the order noted in the figures. For example, two blocks shown in
succession may in fact be executed substantially concurrently, or the blocks may
sometimes be executed in the reverse order, depending upon the functionality
involved.

A. General Description of NMR Spectroscopy of Lipoproteins in Blood and 
Plasma 

¹H NMR spectra of human blood plasma contain two prominent peaks centered at
approximately 1.2 and 0.8 ppm (relative to the chemical shift standard, TSP). These
peaks arise from methylene (CH2) and methyl (CH3) protons, respectively, of plasma
lipids. Each of these peaks is very heterogeneous in nature, consisting of overlapping
resonances from protons of the several chemically distinct classes of lipids present in
plasma: triglycerides, cholesterol, cholesterol esters, and phospholipids. These lipids
are packaged together into three major classes of lipoprotein particles, which differ in
the proportions of lipids that they contain. These lipoprotein particles also differ in
density, from which their names are derived: very low density lipoprotein (VLDL),
low density lipoprotein (LDL), and high density lipoprotein (HDL).




Attorney Docket No. 9062-27 

These major classes of lipoprotein constituents may be further subdivided into
subclasses. A subclass of lipoprotein particles comprises constituents that have
common physical properties, such as density, which permit the subclass to be
fractionated from other subclasses, and that exhibit NMR properties that are distinct
from those of other subclasses. The NMR properties of one subclass may be distinct in a
number of ways, such as chemical shift or lineshape variations, which make the
subclass distinguishable from other subclasses. Subclasses distinguished on the basis of density
may be considered as a subclass of the class of lipoprotein that contains particles of
the subclass's density. Delineation of lipoprotein subclasses is discussed in U.S.
Patent No. 5,343,389 to Otvos, the disclosure of which was incorporated by reference above
as if recited in full herein.

Lipoprotein subclass information is not included in conventional commercially
prepared lipid panels. The conventional panels typically provide information only
concerning total cholesterol, triglycerides, low-density lipoprotein (LDL) cholesterol
(generally a calculated value), and high-density lipoprotein (HDL) cholesterol. In
contrast, NMR lipoprotein subclass analysis can provide information about (a) the
concentrations of a plurality of individual subclasses or groupings of similar subclasses, namely
selected ones or groupings of at least about six very low density lipoprotein (VLDL) subclasses
(V1-V6), selected ones or groupings of at least about four subclasses of LDL
(including intermediate-density lipoprotein, IDL) (L1-L4), and selected ones or groupings of at
least about five subclasses of HDL (H1-H5); (b) average LDL particle size (which can
be used to categorize individuals into LDL subclass pattern-determined risk, such as
Pattern A, AB, or B); and (c) LDL particle concentration (elevated concentrations
being associated with increased cardiovascular risk). The phrase "selected ones or
groupings" of subclass constituents means that each individual subclass constituent
can be measured or that combined related subclasses may be used to provide the
constituent subclass measurement. For example, V1 may include the constituents of
V2 and/or V3, and V6 may be measured alone or as inclusive of V4
and/or V5. In other embodiments, V1, V2 and V3 could be independently assessed
and separately reported. Similar individual assessments or groupings may be carried
out for other related subclass groupings of LDL (which may include IDL) and HDL.
See U.S. Patent Application Serial No. 09/992,068, entitled Methods, Systems, and
Computer Program Products for Analyzing and Presenting NMR Lipoprotein-Based
Risk Assessment Results, for further description of risks associated with measured
values of subclasses, the contents of which are hereby incorporated by reference as if
recited in full herein.

Generally described, only the fraction of the lipids in these lipoprotein particles
that are in a fluid, mobile state (as opposed to an ordered liquid-crystalline state)
contributes to the plasma lipid NMR resonances. The heterogeneity of these plasma
signals is reflected by their complex lineshapes, which vary from person to person
owing to variations in the plasma concentrations of the different lipoprotein particles,
each of which has its own characteristically different NMR spectral properties.
NMR spectroscopy can be employed to determine the concentrations of
lipoprotein classes (VLDL, LDL, HDL, and chylomicrons) and lipoprotein subclasses
of a blood and/or plasma sample, as well as a protein constituent, by a computer
analysis of the lineshapes of its methyl and methylene signals (use of the methyl
signal alone has been found to be preferable). This region of the observed plasma
spectrum can be accurately represented by a simple linear combination of the spectra
of the major lipoprotein and protein classes and/or subclasses noted above into which
plasma can be fractionated by differential flotation ultracentrifugation.

The NMR spectral properties of these classes have been found to be quite
similar from person to person. Thus, differences among the NMR signals from the
plasma of individuals are caused by differences in the amplitudes of the lipid
resonances for these constituents, which in turn are proportional to their
concentrations in the plasma.

The small person-to-person variations in the lineshapes of the lipoprotein
classes are caused by the subclass heterogeneity known to exist within each of these
lipoprotein classes. Figure 1 shows the lineshapes and chemical shifts (positions) for
a number of subclasses of lipoproteins. As shown in Figure 1, the chemical shift and
lineshape differences between the subclasses are much smaller than those between the
major lipoprotein classes, but are completely reproducible. Thus, differences among
the NMR signals from the plasma of individuals are caused by differences in the
amplitudes of the lipid resonances from the subclasses present in the plasma, which in
turn are proportional to their concentrations in the plasma. This is illustrated in
Figure 2, in which the NMR chemical shift spectrum of a blood plasma sample is
shown. The spectral peak 60 produced by methyl (CH3) protons (shown as a solid line)
is shown for the blood sample in Figure 2. The spectral peak 61 (shown as a dotted
line) in Figure 2 is produced by the arithmetic sum of the NMR signals produced by the
lipoprotein subclasses of the major classes VLDL, LDL, HDL, proteins and
chylomicrons, as illustratively shown by certain of the subclasses in Figure 1. It can be
seen that the lineshape of the whole plasma spectrum is dependent on the relative
amounts of the lipoprotein subclasses whose amplitudes change (sometimes
dramatically) with their relative concentrations in the plasma sample.

Since the observed CH3 lineshapes of whole plasma samples are closely
simulated by the appropriately weighted sum of lipid signals of its constituent
lipoprotein classes, it is possible to extract the concentrations of these constituents
present in any sample. This is accomplished by calculating the weighting factors
which give the best fit between observed blood plasma NMR spectra and the
calculated blood plasma spectra. Generally speaking, the process of NMR lipoprotein
analysis can be carried out by the following steps: (1) acquisition of an NMR
"reference" spectrum for each of the "pure" individual or related groupings of
constituent lipoprotein classes and/or subclasses of plasma of interest, (2) acquisition
of a whole plasma NMR spectrum for a sample using measurement conditions
substantially identical to those used to obtain the reference spectra, and (3) computer
deconvolution of the plasma NMR spectrum in terms of the constituent classes and/or
subclasses (or related groupings thereof) to give the concentration of each lipoprotein
constituent expressed as a multiple of the concentration of the corresponding
lipoprotein reference.

In the past, the plasma lineshape analysis was accomplished by calculating
weighting coefficients for each of the reference NMR spectra which minimize the
sum of squared deviations between the observed plasma NMR spectrum and that which
is calculated by summing the weighted reference spectra, and by iteratively removing
those constituents having negative values. Typically, a correlation coefficient
(calculated by the method described below) of at least 0.999 between the measured
spectrum and the calculated lineshape was used to indicate a successful deconvolution
of the spectrum.

Although, as implied above, the procedure can be carried out on lipoprotein
classes, carrying out the process for subclasses of lipoproteins can decrease the error
between the calculated lineshape and the NMR lineshape, thus increasing the
accuracy of the measurement while allowing for simultaneous determination of the
subclass profile of each class. Because the differences in subclass lineshapes and
chemical shifts are small, it is typically important to correctly align the reference
spectrum of each subclass with the plasma spectrum. The alignment of these spectra
is accomplished by the alignment of control peaks in the spectra, which are known to
respond in the same manner to environmental variables, such as temperature and
sample composition, as do the lipoprotein spectra. One such suitable alignment peak
is the peak produced by CaEDTA, although other EDTA peaks or other suitable peaks
may be utilized. By alignment of the spectra, the small variations in the subclasses'
lineshapes and chemical shifts may be exploited to produce higher accuracy and
subclass profiles.
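The control-peak alignment described above can be sketched as follows. This is a
minimal illustration, assuming both spectra are sampled on the same chemical-shift grid
and that the control peak (for example, the CaEDTA peak) is the largest point inside a
known index window [lo, hi). The function names are hypothetical, and the shift is
restricted to whole data points; a production implementation would likely interpolate
to sub-point offsets.

```python
def control_peak_index(spectrum, lo, hi):
    """Index of the largest point in spectrum[lo:hi] (assumed to be the control peak)."""
    window = spectrum[lo:hi]
    return lo + max(range(len(window)), key=window.__getitem__)


def shift_points(spectrum, offset):
    """Shift a spectrum by a whole number of points, filling vacated points with 0.0."""
    n = len(spectrum)
    shifted = [0.0] * n
    for i, value in enumerate(spectrum):
        j = i + offset
        if 0 <= j < n:
            shifted[j] = value
    return shifted


def align_reference(reference, plasma, lo, hi):
    """Shift `reference` so its control peak lands on the plasma control peak."""
    offset = control_peak_index(plasma, lo, hi) - control_peak_index(reference, lo, hi)
    return shift_points(reference, offset), offset


# Usage: the reference control peak sits two points earlier in the array than the
# plasma control peak, so the reference is shifted by +2 points.
aligned, offset = align_reference([0, 0, 1, 5, 1, 0, 0, 0],
                                  [0, 0, 0, 0, 1, 5, 1, 0], lo=0, hi=8)
print(offset, aligned)
```

The same offset would be applied to every point of the reference array before it enters
the fit, so that the small subclass chemical-shift differences are not masked by a
global misregistration.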

Further description of these methods can be found in U.S. Patent Nos.
4,933,844 and 5,343,389 to Otvos.

B. Lineshape 

The mathematics used in the lineshape fitting process (i.e., least squares fit of
an unknown function in terms of a weighted sum of known functions) is well known
and is described in many textbooks of numerical analysis, such as F.B. Hildebrand,
Introduction to Numerical Analysis, 2nd edition, pp. 314-326, 539-567, McGraw-Hill,
1975.

In particular embodiments, reference samples of each constituent lipoprotein
and protein component to be analyzed are prepared (typically they are refrigerated
during storage and allowed to warm prior to analysis) and placed within the
spectrometer 10 (Figure 7). An NMR measurement is then taken on each reference
sample to define a standard for the respective constituent. The data for the reference
samples (for a plurality of different constituents) are processed and stored in a
processor 310 (Figure 6) and/or computer (such as shown for feature 11 in Figure 7).
Techniques for acquiring and storing NMR spectroscopic data are well known to
those skilled in this art and need not be described in further detail. The reference
samples or standards may be established a priori and used to measure a plurality of
different patient specimens or samples over time.

To carry out the analysis, the data points of the real part of the sample plasma
spectrum that comprise the spectral region to be fit (normally 0.73-0.85 ppm for
lipoprotein evaluations) are entered into an array. This plasma array consists of m
discrete data points denoted $P_i^m$, $i = 1, 2, \ldots, m$. The data points of the real
part of the lipoprotein subspecies reference spectra for the same spectral region are
entered into separate arrays. The data points of these arrays are denoted $V_{ji}$, where
$i = 1, 2, \ldots, m$ data points and $j = 1, 2, \ldots, n$ constituents. It is noted that in
the Equations and text describing same that follows, some symbols may be bolded
and/or italicized at certain locations but not at other locations; however, this is not
meant to alter the correlation or change the meaning of the symbol herein.

The method for fitting the measured sample plasma spectrum, $P_i^m$, with a
linear combination of n constituent spectra is based on the premise that there is a set
of coefficients (weighting factors), $c_j$, corresponding to the contributions of
component j (lipoprotein subclass components and protein component), and a
coefficient, $c'$, corresponding to the imaginary portion of the sample plasma
spectrum, such that for each data point, $P_i^m \approx P_i^c$, where

$$P_i^c = \sum_{j=1}^{n} c_j V_{ji} + c' V'_i \qquad \text{(calculated plasma spectrum)} \tag{1}$$
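Equation (1), $P_i^c = \sum_j c_j V_{ji} + c' V'_i$, can be illustrated with a short sketch
that forms the calculated plasma spectrum point by point. The function name and the
list-of-lists layout (one list per reference spectrum, so that `refs[j][i]` holds $V_{ji}$)
are assumptions made for illustration only.

```python
def calculated_spectrum(c, refs, c_imag, v_imag):
    """P_i^c = sum_j c_j * V_ji + c' * V'_i for every data point i.

    c      -- weighting factors c_1..c_n
    refs   -- reference spectra; refs[j][i] is V_ji
    c_imag -- coefficient c' of the imaginary part
    v_imag -- imaginary part of the plasma spectrum, V'_i
    """
    m = len(v_imag)
    if len(c) != len(refs) or any(len(ref) != m for ref in refs):
        raise ValueError("need one coefficient per reference and m points per spectrum")
    return [sum(cj * ref[i] for cj, ref in zip(c, refs)) + c_imag * v_imag[i]
            for i in range(m)]


p_calc = calculated_spectrum([2.0, 1.0], [[1, 0, 0], [0, 1, 1]], 0.5, [0, 0, 2])
print(p_calc)  # [2.0, 1.0, 2.0]
```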



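In the past, the best fit was achieved when the root mean square error,

$$\varepsilon_{\mathrm{RMS}} = \left[ \frac{1}{m} \sum_{i=1}^{m} e_i^2 \right]^{1/2} \tag{2}$$

was minimized, where $e_i = P_i^m - P_i^c$. This was accomplished by finding those
coefficients which minimize $\sum e_i^2$, that is, when

$$\frac{\partial \left( \sum_{i=1}^{m} e_i^2 \right)}{\partial c_j} = 0, \tag{3}$$

$j = 1, 2, \ldots, n+1$ (n-1 subspecies components plus protein and plasma spectrum
phase contributions). Differentiation results in n+1 simultaneous linear equations:

$$\sum_{j=1}^{n+1} c_j \left( \sum_{i=1}^{m} V_{ki} V_{ji} \right) = \sum_{i=1}^{m} V_{ki} P_i^m, \qquad k = 1, 2, \ldots, n+1 \tag{4}$$

where $c_{n+1} = c'$ and $V_{(n+1)i} = V'_i$. If

$$a_{kj} = \sum_{i=1}^{m} V_{ki} V_{ji} \quad \text{and} \quad s_k = \sum_{i=1}^{m} V_{ki} P_i^m, \tag{5}$$

then there are n+1 simultaneous linear equations of the form:

$$\sum_{j=1}^{n+1} c_j a_{kj} = s_k, \qquad k = 1, 2, \ldots, n+1 \tag{6}$$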



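Forming the (n+1) × (n+1) matrix $[A] = [a_{kj}]$, $j = 1, 2, \ldots, n+1$;
$k = 1, 2, \ldots, n+1$, gives $[A]C = S$, where C and S are the column vectors

$$C = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_{n+1} \end{bmatrix} \quad \text{and} \quad S = \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_{n+1} \end{bmatrix} \tag{7}$$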
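The construction of the normal-equation matrix in Equations (5)-(7), with
$a_{kj} = \sum_i V_{ki} V_{ji}$ and $s_k = \sum_i V_{ki} P_i^m$, can be sketched as follows.
The system $[A]C = S$ is solved here by plain Gaussian elimination with partial
pivoting and back-substitution. That solver is a swapped-in technique that assumes
[A] is well conditioned (the fixed `1e-12` pivot tolerance is an arbitrary, hypothetical
choice); the document itself uses the singular value decomposition for this step.

```python
def normal_equations(basis, p_meas):
    """Build [A] and S: a_kj = sum_i V_ki V_ji and s_k = sum_i V_ki P_i^m.

    basis holds the n+1 spectra V_1..V_n followed by V' (imaginary part), each of length m.
    """
    a = [[sum(vk_i * vj_i for vk_i, vj_i in zip(vk, vj)) for vj in basis] for vk in basis]
    s = [sum(vk_i * p_i for vk_i, p_i in zip(vk, p_meas)) for vk in basis]
    return a, s


def solve_gaussian(a, s):
    """Solve [A]C = S by Gaussian elimination with partial pivoting and back-substitution."""
    n = len(s)
    aug = [list(map(float, row)) + [float(s_k)] for row, s_k in zip(a, s)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("[A] is singular or too ill-conditioned for this solver")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution, last coefficient first
        c[r] = (aug[r][n] - sum(aug[r][k] * c[k] for k in range(r + 1, n))) / aug[r][r]
    return c


# Usage: two basis spectra over three points; the "plasma" is exactly 2*V1 + 3*V2.
basis = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
a, s = normal_equations(basis, [2.0, 3.0, 5.0])
print(a, s, solve_gaussian(a, s))
```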



The coefficients providing the best fit were calculated by decomposition of the matrix
[A] into a new set of (n+1) × (n+1) matrices known collectively as the "singular value
decomposition" of [A]:

$$[A] = [U][W][V]^T \tag{8}$$

where [U] is a matrix of orthogonal column vectors (scalar products = 0), $[V]^T$ is the
transpose of an orthogonal matrix [V], and [W] is a diagonal matrix with positive or
zero elements, called "singular values":

$$[W] = \begin{bmatrix} w_1 & 0 & \cdots & 0 \\ 0 & w_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_{n+1} \end{bmatrix} \tag{9}$$

From this,

$$[A]^{-1} = [V][W]^{-1}[U]^T \tag{10}$$

where

$$[W]^{-1} = \begin{bmatrix} 1/w_1 & 0 & \cdots & 0 \\ 0 & 1/w_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/w_{n+1} \end{bmatrix} \tag{11}$$

which allows C to be solved for:

$$C = [V][W]^{-1}[U]^T S \tag{12}$$

where C was the best possible solution vector, provided that values of $w_j$ below a
certain threshold value (selected by the user) are ignored ($1/w_j$ set to zero). These
small singular values can give rise to "ill-conditioned" linear combinations of
near-degenerate solutions, which are the most corrupted by roundoff errors. The actual
solution of C was obtained by "back-substitution," in which the last element of the
solution is determined first, allowing for the solution of the element before it, and so
on.
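The thresholded singular value solution $C = [V][W]^{-1}[U]^T S$ can be sketched in a few
lines. Because $a_{kj} = \sum_i V_{ki} V_{ji}$ makes [A] a symmetric positive
semi-definite (Gram) matrix, its singular value decomposition coincides with its
eigendecomposition ([U] = [V], singular values = eigenvalues). The sketch therefore
uses cyclic Jacobi rotations, a swapped-in technique chosen only because it fits in the
standard library. The relative threshold `rel_threshold` is a hypothetical stand-in for
the user-selected threshold, and the solve multiplies by $[V]^T S$ directly rather than by
back-substitution.

```python
import math


def jacobi_eigen(a, tol=1e-24, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (w, v): eigenvalues w and a matrix v whose columns are the eigenvectors.
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in a for x in row) or 1.0
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors, V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def truncated_solve(a, s, rel_threshold=1e-10):
    """C = V W^-1 V^T S, with 1/w_j set to zero when |w_j| <= rel_threshold * max|w|."""
    w, v = jacobi_eigen(a)
    n = len(s)
    cutoff = rel_threshold * max(abs(x) for x in w)
    proj = [sum(v[i][j] * s[i] for i in range(n)) for j in range(n)]  # V^T S
    scaled = [proj[j] / w[j] if abs(w[j]) > cutoff else 0.0 for j in range(n)]
    return [sum(v[i][j] * scaled[j] for j in range(n)) for i in range(n)]


print(truncated_solve([[2.0, 1.0], [1.0, 2.0]], [7.0, 8.0]))  # close to [2.0, 3.0]
print(truncated_solve([[1.0, 1.0], [1.0, 1.0]], [2.0, 2.0]))  # singular: close to [1.0, 1.0]
```

In the second call [A] is exactly singular; zeroing $1/w_j$ for the null direction
returns the minimum-norm solution instead of failing, which is the behavior the
threshold rule above is meant to provide.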

The root mean square deviation (RMSD) is computed as

$$\varepsilon_{\mathrm{RMS}} = \left[ \frac{1}{m} \sum_{i=1}^{m} \left( P_i^m - P_i^c \right)^2 \right]^{1/2} \tag{13}$$



The correlation coefficient was computed as

$$r^2 = 1 - \frac{\sum_{i=1}^{m} \left( P_i^m - P_i^c \right)^2}{\sum_{i=1}^{m} \left( P_i^m \right)^2 - \frac{1}{m} \left( \sum_{i=1}^{m} P_i^m \right)^2} \tag{14}$$
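The two goodness-of-fit measures can be computed as sketched below. The RMSD follows
$\left[\frac{1}{m}\sum_i (P_i^m - P_i^c)^2\right]^{1/2}$. For the correlation measure the
sketch assumes the coefficient-of-determination form
$r^2 = 1 - \sum_i (P_i^m - P_i^c)^2 \big/ \left[\sum_i (P_i^m)^2 - \frac{1}{m}(\sum_i P_i^m)^2\right]$;
that form is an assumption about Equation (14), and the function names are
hypothetical.

```python
import math


def rms_deviation(p_meas, p_calc):
    """Root mean square deviation between measured and calculated spectra."""
    m = len(p_meas)
    return math.sqrt(sum((pm - pc) ** 2 for pm, pc in zip(p_meas, p_calc)) / m)


def r_squared(p_meas, p_calc):
    """1 - residual sum of squares / total sum of squares about the mean of p_meas."""
    m = len(p_meas)
    ss_res = sum((pm - pc) ** 2 for pm, pc in zip(p_meas, p_calc))
    ss_tot = sum(pm * pm for pm in p_meas) - sum(p_meas) ** 2 / m
    return 1.0 - ss_res / ss_tot


measured = [1.0, 2.0, 3.0, 4.0]
fitted = [1.0, 2.0, 3.0, 5.0]
print(rms_deviation(measured, fitted), r_squared(measured, fitted))  # 0.5 0.8
```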



In the past, the component coefficients resulting from this lineshape analysis provided
the concentrations of the lipoprotein and protein constituents in each plasma sample.
Each concentration can be expressed relative to the concentration of the lipoprotein
whose spectrum is used as the reference. In operation, the final concentrations may
be normalized to the integrated area of the resonance from a trimethylacetate external
standard sample run on the same day to correct for variations in the detection
sensitivity of the NMR spectrometer.

As described above, the least squares method used in the past for NMR-derived
measurement of lipoprotein subclasses required that the derived concentrations be
non-negative. Generally described, in the past, when a negative coefficient for a
selected constituent associated with one of the standards was encountered, it was
constrained to zero, and the calculation was performed again, subject to that
constraint. This constraint can be desirable when fitting plasma samples that may not
contain one or more of the components included in the fit model, or because
experimental errors in the data (noise) can cause the calculation to give negative
values for concentrations for these components.

Figure 3 illustrates a flow chart of operations with reference to certain of the
above-stated equations in blocks 100-160. In operation, the spectra of the subspecies
components are read into Array V (block 100). The real part of the sample plasma
spectrum is read into Array P (block 110). The imaginary part of the sample plasma
spectrum is read into Array V' (block 120). Matrix [A] and vector S are calculated
(block 130) using Equation 5. Matrix [A] is decomposed into a singular value
decomposition (block 140), such as by using Equation 8. The singular values are
selected based on a predetermined acceptance function (block 145). The coefficient
vector C is calculated using back-substitution (block 150). The negative values in C
are sequentially set to zero and the curve is refit until there are no negatives left. The
yes-or-no inquiry at block 151 asks whether there are negatives left and, if so,
directs the program to return to the operation in block 130 and, if not, directs the
operations to advance to block 155. C is multiplied by normalization constants to
obtain concentrations (block 155). The root mean square deviation and correlation
coefficient are calculated (block 160), such as by using Equations 13 and 14.
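The refit loop of blocks 130-151 (set negative coefficients to zero and refit until none
remain) can be sketched as follows. Two details are assumptions, not taken from the
flow chart: each pass removes only the single most negative coefficient, and each refit
is an unconstrained least-squares solve via the normal equations (Gauss-Jordan
elimination, which raises `ZeroDivisionError` if the remaining spectra are linearly
dependent). The constraint is applied to every coefficient, as the flow chart
describes; the function names are hypothetical.

```python
def least_squares(basis, y):
    """Unconstrained weights for y ~ sum_j w_j * basis[j] via the normal equations,
    reduced by Gauss-Jordan elimination with partial pivoting."""
    k = len(basis)
    aug = [[sum(u * v for u, v in zip(bi, bj)) for bj in basis] +
           [sum(u * v for u, v in zip(bi, y))] for bi in basis]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * z for x, z in zip(aug[r], aug[col])]
    return [aug[r][k] / aug[r][r] for r in range(k)]


def fit_without_negatives(basis, y):
    """Refit repeatedly, constraining the most negative coefficient to zero each pass."""
    active = list(range(len(basis)))
    coeffs = [0.0] * len(basis)
    while active:
        sol = least_squares([basis[j] for j in active], y)
        worst = min(range(len(active)), key=sol.__getitem__)
        if sol[worst] >= 0.0:
            for j, value in zip(active, sol):
                coeffs[j] = value
            break
        del active[worst]  # this constituent is fixed at zero from now on
    return coeffs


print(fit_without_negatives([[1.0, 0.0], [1.0, 1.0]], [0.0, 1.0]))  # [0.0, 0.5]
```

In the usage line the unconstrained fit gives weights (-1, 1); the first spectrum is
dropped and the second is refit alone, giving 0.5.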

Embodiments of the present invention modify and improve on the
conventional protocol by employing operations that can reduce measurement
variability in individual constituents and/or reduce the number of constituents of
interest that are reported as having a "0" value. The variability can be assessed by
repeatedly analyzing a given sample and measuring the individual constituents. The
individual constituents measured by the present invention will typically be clustered
more tightly together relative to the individual constituents measured by the
conventional protocol. The methods and systems of the present invention can reduce
the variability by at least about 50% relative to the prior method for the same sample.
Further, when analyzing the same sample in repeated interrogations, the measured
values of at least a majority of the constituents of interest, if not all of the constituents
of interest, can be reproducible, typically within about +/- 2.34% (median CV). See
Table 2 in the Example Section.
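The median CV used above as a reproducibility measure can be computed from repeated
analyses of one sample, as in the sketch below. The constituent labels and replicate
values are toy numbers, not measurements. Note that the CV is undefined for a
constituent whose mean is zero, which is one practical reason to reduce the number of
constituents reported as "0".

```python
import statistics


def percent_cv(values):
    """Coefficient of variation in percent: 100 * sample standard deviation / mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def median_cv(replicates):
    """Median of the per-constituent CVs; replicates maps a constituent label to the
    values measured for it in repeated analyses of one sample."""
    return statistics.median(percent_cv(v) for v in replicates.values())


runs = {"V1": [10.0, 10.0, 10.0], "L1": [9.0, 10.0, 11.0], "H1": [8.0, 10.0, 12.0]}
print(median_cv(runs))  # 10.0
```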

Referring now to Figure 4, operations of certain embodiments of the invention
are illustrated. It is noted that the term "matrix," as used herein, can, in certain
embodiments, be a vector, as a vector is a special form of a matrix (i.e., a vector is a
matrix with n rows and 1 column, or 1 row and k columns). As shown in Figure 4,
the operations can include generating a mathematical design matrix of constituent
data comprising a plurality of mathematical constituent matrix data sets, each
constituent data set including amplitude values of a respective spectrum lineshape of a
selected independent constituent parameter over desired data points generated by a
predetermined analysis method (of a known reference sample) (block 200). The
selected constituent parameter (the independent parameter) can be wavelength,
voltage, current, speed, force, torque, pressure, movement, energy, chemical shift
(ppm), temperature, or frequency. Exemplary dependent parameters of interest may
include, but are not limited to, intensity, opacity, transmittance, reflectance,
fluorescence, vibration, or other desired parameter. The constituent data of the design
matrix ("X") can be reference or standard data established a priori from separate
individual analysis of discrete constituents of interest and/or stored in an accessible
database to be used as a standard and applied in analysis of all or selected ones of
unknown samples.

A composite mathematical matrix can be generated comprising a data set of
amplitude values of a composite spectrum lineshape over the desired data points for
an unknown sample that is generated by a predetermined analysis method. The
composite lineshape comprises spectral contributions from a plurality of the selected
individual constituents (block 205). The design matrix can be rotated to yield a
rotated design matrix of principal components (which may, in certain embodiments,
be mathematically represented by matrix "Z" as will be discussed further below) and
processed to selectively exclude data for certain principal components to generate a
reduced design matrix (which may, in certain embodiments, be represented
mathematically by matrix "X*" as will be discussed further below) (block 220). The
term "principal components" means individual identifiable constituents (and may
include both relevant and non-relevant constituents) in the rotated space. In
operation, in certain embodiments, the operations can include mathematically rotating
the design matrix, interrogating the rotated design matrix (using an acceptance
function) to find those rotated principal components with contributions that benefit
the deconvolution, and rotating back those accepted principal components to form the
reduced design matrix.

In certain embodiments, a normal equations matrix (which, in certain
embodiments, may be mathematically represented by matrix "X^T X") can be computed
from the design matrix (block 225). The normal equations matrix can be interrogated
by applying a predetermined acceptance function ("A(Λ)") to the principal
components to generate the reduced design matrix. The acceptance function can be a
forced logic function of "0" and "1" (representative of rejected (excluded) values and
accepted (included) values, respectively) or may be a relative or absolute function that
discards the principal components having values that are low with respect to other
components or relative to a predefined threshold (i.e., the values having the least
significance) and retains the more significant values in the reduced design matrix.
The reduced matrix may be generated by rotating the design matrix and eliminating
the column or columns in the rotated design matrix with the most "0"s as determined
by the acceptance function.
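A relative or absolute acceptance function of the kind just described can be sketched
as a 0/1 mask over the eigenvalues of the principal components. The function name and
the two threshold parameters are hypothetical; a forced logic function corresponds to
simply supplying the mask by hand.

```python
def acceptance(eigenvalues, rel_threshold=None, abs_threshold=None):
    """Return a 0/1 mask: 1 keeps a principal component, 0 excludes it.

    A component is rejected when its eigenvalue falls below rel_threshold * (largest
    eigenvalue) and/or below abs_threshold; with neither threshold set, all are kept.
    """
    largest = max(eigenvalues)
    mask = []
    for lam in eigenvalues:
        keep = True
        if rel_threshold is not None and lam < rel_threshold * largest:
            keep = False
        if abs_threshold is not None and lam < abs_threshold:
            keep = False
        mask.append(1 if keep else 0)
    return mask


eigs = [10.0, 1.0, 1e-6]
print(acceptance(eigs, rel_threshold=1e-3))  # [1, 1, 0]
print(acceptance(eigs, abs_threshold=2.0))   # [1, 0, 0]
```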

Regression fit weighting coefficients can be computed based on accepted
principal components of the rotated design matrix in the reduced design matrix to
determine the presence of and/or measurement of the selected or target constituents in
the unknown sample undergoing analysis (block 230). In particular embodiments, the
weighting coefficients may be determined according to Equation (20) as will be
discussed further below. A sequential least squares regression analysis can then be
employed to restrict or restrain negative coefficients to zero until all (or substantially
all) constituents of interest are non-negative (block 231). In certain embodiments,
before the sequential regression analysis evaluation is performed, the reduced design
matrix is combined with the composite matrix to define a first set of weighting
factors.

Described differently, the signal from the unknown test sample can be
projected onto the space spanned by selected principal components, and the projection
coefficients can be transformed back into the original space to provide a reduced
design matrix for arriving at weighting coefficients. As such, the design matrix can
be mapped into the rotated design matrix and the components selected to yield the
reduced design matrix.

The reduced design matrix can be generated based on predetermined criteria
using a shrinkage estimator. In certain embodiments, the shrinkage estimator can be
based on the spectral decomposition of a matrix defined by the multiplication of the
constituent matrix with the transposed constituent matrix. In certain embodiments,
the shrinkage estimator can be found by projecting the constituent matrix onto the
space spanned by the accepted basis set determined from the rotation of the design
matrix, and shrinking the projection of the constituent matrix on the orthogonal
subspace to zero. A particularly suitable shrinkage estimator is described in Equation
(20).
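The projection step described above can be sketched as follows, under stated
assumptions: the rows of X live in the (k+w)-dimensional coefficient space, `q` is an
orthogonal matrix whose columns are eigenvectors of $X^T X$, and `mask` is a 0/1
acceptance mask over those columns. Multiplying by the projector $P = \sum_{j \in
\text{accepted}} q_j q_j^T$ keeps the component of each row in the accepted subspace and
shrinks the orthogonal component to zero. This is the projection used in principal
component regression, one shrinkage estimator of this type; it is not claimed to be
Equation (20), and the function names are hypothetical.

```python
def projector(q, mask):
    """P = sum of q_j q_j^T over accepted columns j of the orthogonal matrix q."""
    p = len(q)
    return [[sum(q[r][j] * q[c][j] for j in range(p) if mask[j]) for c in range(p)]
            for r in range(p)]


def reduced_design_matrix(x, q, mask):
    """Project each row of x (n rows, p columns) onto the accepted subspace: X* = X P."""
    proj = projector(q, mask)
    p = len(q)
    return [[sum(row[r] * proj[r][c] for r in range(p)) for c in range(p)] for row in x]


h = 2 ** -0.5
q = [[h, -h], [h, h]]  # columns: (1, 1)/sqrt(2) and (-1, 1)/sqrt(2)
print(reduced_design_matrix([[1.0, 3.0]], q, [1, 0]))  # approximately [[2.0, 2.0]]
```

The row (1, 3) keeps only its component along (1, 1)/sqrt(2), which is (2, 2); the
component along the rejected direction is shrunk to zero.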

It is noted that other shrinkage estimators may also be employed. Generally
stated, a shrinkage estimator of a parameter b is any estimator B(X) of the data X such
that $\|E\{B(X)\}\| < \|b\|$. A simple example would be to take an unbiased estimator
of b, say U(X), and multiply it by a constant smaller than 1: $B(X) = \rho U(X)$, where
$0 < \rho < 1$. Because U(X) is unbiased, by definition $E\{U(X)\} = b$. Then the norm of
the expectation could be expressed as
$\|E\{B(X)\}\| = \|E\{\rho U(X)\}\| = \rho \|E\{U(X)\}\| = \rho \|b\| < \|b\|$, since $\rho < 1$.
In the shrinkage estimator of Equation (20), shrinkage is carried out selectively, in the
direction of zero for some components, and not for others.
The number of individual constituent data sets can be at least ten (10), each
representing a respective one of at least ten (10) different closely correlated chemical
constituents, some of the constituents having overlapping signal lines in a region of
the spectrum analyzed (block 202). The number of columns in the design constituent
matrix can correspond to the number of different individual constituents of interest,
and, where needed, at least one additional column, which may be a matrix of
variables, representing spectral contributions from at least one non-relevant variable
constituent and/or noise (block 201). In operation, this additional column may not be
used (i.e., "0"). The at least one non-relevant variable can be a constituent known to
be in the sample but not of target interest and/or background or environmental noise,
and the like.

In certain embodiments, the predetermined analysis method is NMR
spectroscopy, and the composite signal represents intensity over a desired interval or
region in a chemical shift spectrum (typically represented in ppm) such that intensity
is the dependent variable parameter (block 212).

Figure 5 is a schematic illustration of certain embodiments of the
deconvolution operations used to evaluate closely correlated signal data. As shown, a
design matrix "X" of constituent data comprising a plurality of individual
mathematical data sets, each constituent data set including amplitude values of a
respective spectrum lineshape of a selected constituent parameter over the variable
space, spectrum length, or data points of interest, is obtained. The coordinate system
of the design matrix is rotated to generate a rotated design matrix "Z" and, ultimately,
a reduced design matrix "X*" (and a related transposed matrix "X**"). The line
extending between X* and X** represents a classifier or acceptance function that
determines what principal component data in X will be excluded from the reduced
design matrix. The matrix is then rotated back to the original coordinate system,
thereby generating a reduced design matrix "X*" with data from X modified by the
analysis performed at the rotation of the coordinate system. The matrix of the
composite spectrum lineshape data "Y" is projected onto X* and the weighting
coefficients "b" calculated. A sequential least squares ("SLS") regression analysis is
performed on the defined weighting coefficients to ensure that non-negative weighting
coefficients are established. The operations may be iteratively repeated.

Figure 6 is a block diagram of exemplary embodiments of data processing
systems that illustrates systems, methods, and computer program products in
accordance with embodiments of the present invention. The processor 310
communicates with the memory 314 via an address/data bus 348. The processor 310
can be any commercially available or custom microprocessor. The memory 314 is
representative of the overall hierarchy of memory devices containing the software and
data used to implement the functionality of the data processing system 305. The
memory 314 can include, but is not limited to, the following types of devices: cache,
ROM, PROM, EPROM, EEPROM, flash memory, SRAM, and DRAM.

As shown in Figure 6, the memory 314 may include several categories of
software and data used in the data processing system 305: the operating system 352;
the application programs 354; the input/output (I/O) device drivers 358; a
deconvolution module 350; and the data 356. The deconvolution module 350 can
include a fitting module with computer program code that rotates a design matrix of a
priori constituent data and generates a reduced design matrix of selected principal
components using an acceptance function (rotated back) to yield non-negative
weighting factors for target constituents of interest in an unknown sample undergoing
analysis.

In certain embodiments, the deconvolution fitting module can employ a
shrinkage estimator. The acceptance function can use eigenvalues and a classifier.
The composite data of the unknown sample can be presented as a matrix that is
projected onto the reduced design matrix to yield the weighting coefficients of the
target constituents. The data 356 may include signal (constituent and/or composite
spectrum lineshape) data 362, which may be obtained from a data or signal acquisition
system 320. As will be appreciated by those of skill in the art, the operating system
352 may be any operating system suitable for use with a data processing system, such
as OS/2, AIX or OS/390 from International Business Machines Corporation, Armonk,
NY, WindowsCE, WindowsNT, Windows95, Windows98, Windows2000 or
WindowsXP from Microsoft Corporation, Redmond, WA, PalmOS from Palm, Inc.,
MacOS from Apple Computer, UNIX, FreeBSD, or Linux, proprietary operating
systems or dedicated operating systems, for example, for embedded data processing
systems.

The I/O device drivers 358 typically include software routines accessed
through the operating system 352 by the application programs 354 to communicate
with devices such as I/O data port(s), data storage 356 and certain memory 314
components and/or the image acquisition system 320. The application programs 354
are illustrative of the programs that implement the various features of the data
processing system 305 and preferably include at least one application which supports
operations according to embodiments of the present invention. Finally, the data 356
represents the static and dynamic data used by the application programs 354, the
operating system 352, the I/O device drivers 358, and other software programs that
may reside in the memory 314.

While the present invention is illustrated, for example, with reference to the
deconvolution module 350 being an application program in Figure 6, as will be
appreciated by those of skill in the art, other configurations may also be utilized while
still benefiting from the teachings of the present invention. For example, the
deconvolution module 350 may also be incorporated into the operating system 352,
the I/O device drivers 358 or other such logical division of the data processing system
305. Thus, the present invention should not be construed as limited to the
configuration of Figure 6, which is intended to encompass any configuration capable
of carrying out the operations described herein.

In certain embodiments, the deconvolution module 350 includes computer
program code for generating a shrinkage estimator and defining an optimum
weighting factor for a plurality of different selected constituents in complex samples
having a plurality of closely correlated constituents that contribute to a composite
signal. The computer program code can include a sequential least squares regression
analysis based on a statistical model comprising: (a) a mathematical composite matrix
representing spectrum measurements of the amplitude of a composite signal of an
unknown sample across "n" points in the spectrum; and (b) a design matrix including
respective mathematical matrices for the amplitude of each of a plurality of individual
selected constituents across "n" points in the spectrum. The shrinkage estimator and
acceptance function can be used to generate optimum weighting factors "b_opt" for each
constituent of interest based on the difference between the composite signal amplitude
and the constituent amplitudes defined by interrogation of the values in the constituent
and composite vectors. The analysis can be iteratively repeated in a sequential least
squares regression model until target or selected constituents have been assigned
non-negative weighting factors such that a sequential least squares statistical evaluation
produces a satisfactory non-negative solution set for the target constituents.

The I/O data port can be used to transfer information between the data
processing system 305 and the image scanner or acquisition system 320 or another
computer system or a network (e.g., the Internet) or to other devices controlled by the
processor. These components may be conventional components such as those used in
many conventional data processing systems, which may be configured in accordance
with the present invention to operate as described herein.

While the present invention is illustrated, for example, with reference to
particular divisions of programs, functions and memories, the present invention
should not be construed as limited to such logical divisions. Thus, the present
invention should not be construed as limited to the configuration of Figure 6 but is
intended to encompass any configuration capable of carrying out the operations
described herein.

More particularly described, in particular embodiments, a target sample to be
analyzed may have a number of different selected parts or constituents or individual
or groupings of selected constituents. The number of constituent parts may be noted
as k. Thus, a sample undergoing analysis can include constituent parts $P_1, \ldots, P_k$.
As noted above, the number k may be at least 10, and can be between 35 and 40 or
even larger. The sample can be analyzed on a desired suitable analytical instrument,
with the value of the independent variable (e.g., wavelength, retention time, current,
etc., as described above) varied. The amplitude of the dependent variable (e.g.,
intensity) varies corresponding to the detector response of the analytical instrument,
and the variation can be recorded in the form of a spectrum. The spectrum or lineshape
consists of amplitude measurements (that may be intensity measurements in certain
embodiments) at n points. These amplitude measurements of the sample being
analyzed are stored in a composite matrix, Y.

Also, each constituent part $P_j$, $j = 1$ to $k$, is separately analyzed to define a standard or reference over the same independent variable space, region, or data points as the sample undergoing analysis. Each set of reference constituent spectral amplitudes (such as intensities) is stored in a matrix $X_j$, where $j = 1, \ldots, k$, also of length n. Thus, a design constituent matrix can be represented by:

$$X = [X_1, X_2, \ldots, X_k, Z] \qquad (14)$$

where Z is a matrix of amplitude data regarding at least one additional variable that can be deconvolved from the spectral signal. For example, Z may contain data representing spectral intensities of other known or unknown constituents, the




Attorney Docket No. 9062-27 






imaginary part of the spectrum of the analyte sample (where Y contains the real part of that spectrum), noise, etc. However, Z can be a matrix, a vector, or, in certain embodiments, even null (a degenerate matrix with 0 columns).
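By way of non-limiting illustration, the assembly of the design constituent matrix of equation (14) can be sketched in Python with NumPy. The function name `build_design_matrix` and its arguments are hypothetical; the sketch assumes each reference spectrum $X_j$ is a one-dimensional array sampled on the same n points as Y, and that Z, when present, is either a single column of length n or an $n \times w$ array.

```python
import numpy as np


def build_design_matrix(references, z=None):
    """Stack reference spectra X_1..X_k (each of length n) and optional Z into X.

    Hypothetical helper: returns an n x (k + w) array, or n x k when z is None
    (the degenerate case in which Z has 0 columns).
    """
    columns = [np.asarray(r, dtype=float) for r in references]
    n = columns[0].shape[0]
    if any(c.shape != (n,) for c in columns):
        raise ValueError("every reference spectrum must have length n")
    X = np.column_stack(columns)                        # n x k
    if z is not None:
        z = np.asarray(z, dtype=float).reshape(n, -1)   # n x w (w = 1 for a vector)
        X = np.hstack([X, z])                           # n x (k + w)
    return X


# Usage: k = 2 hypothetical reference spectra and w = 1 extra column on n = 5 points
X = build_design_matrix([np.ones(5), np.arange(5.0)], z=np.zeros(5))
print(X.shape)  # (5, 3)
```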

In certain embodiments, Z is a matrix of size $n \times w$, where $w > 0$. In certain particular embodiments, $w = 1$. The estimated contributions of the individual components to the sample or analyte composite spectrum can be found by determining the normalized or optimal coefficient weightings $b_{\mathrm{opt}}$ given by equation (15). The optimal weighting vector minimizes the residual norm inside the braces over all non-negative b.

$$b_{\mathrm{opt}} = \arg\min_b \{\, \lVert Y - Xb \rVert : b \geq 0 \,\} \qquad (15)$$



These normalized weightings can be found by solving equation (15): a shrinkage estimator is applied to the regression problem, followed by non-negative least squares to ensure that the non-negativity constraint is satisfied. The cycle is repeated until the least squares solution provides only non-negative weighting factors.
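As a point of comparison for equation (15) alone, the constrained problem can be solved directly with the non-negative least squares routine `scipy.optimize.nnls`. The sketch below uses hypothetical synthetic Lorentzian lineshapes in place of measured reference spectra; it illustrates the objective and the non-negativity constraint only, not the shrinkage-plus-NNLS cycle described above.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical synthetic references: three overlapping Lorentzian lineshapes on n = 400 points
x = np.linspace(-1.0, 1.0, 400)
centers = (-0.20, 0.00, 0.15)
X = np.column_stack([1.0 / (1.0 + ((x - c) / 0.05) ** 2) for c in centers])

b_true = np.array([2.0, 0.5, 1.0])  # non-negative weightings used to build the composite
Y = X @ b_true                      # noise-free composite spectrum

# nnls minimizes ||Y - X b||_2 subject to b >= 0, i.e. the problem of equation (15)
b_opt, residual_norm = nnls(X, Y)
print(np.round(b_opt, 6))
```

With noise-free data and well-separated enough references, the recovered weightings match the ones used to build Y; with real spectra the residual norm would not be zero.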

The shrinkage estimator can be based on the spectral decomposition of the matrix $M = X^T X$, where $X^T$ represents the transpose of the constituent matrix X.

Further, the spectral decomposition of M may be expressed as:

$$M = Q \Lambda Q^T \qquad (16)$$

where Q, of size $(k+w) \times (k+w)$, is orthogonal, and $\Lambda$, of size $(k+w) \times (k+w)$, is a diagonal matrix of eigenvalues. The eigenvalue matrix $\Lambda$ is sorted with the largest eigenvalue in the (1,1) position, the next largest in the (2,2) position, and so on down the diagonal, until the smallest eigenvalue is placed in the $(k+w, k+w)$ position. An adjustable tolerance parameter $\tau$ can be defined such that $\tau > 0$. Also, an acceptance or classifier function $A(\lambda): \mathbb{R} \to \{0, 1\}$ can be defined, which indicates whether a component is accepted into the fitting model.
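The sorted spectral decomposition of equation (16) can be sketched with NumPy's symmetric eigensolver `numpy.linalg.eigh`. Because `eigh` returns eigenvalues in ascending order, the sketch reverses them so that the largest eigenvalue occupies the (1,1) position. The function name and the random test matrix are illustrative assumptions.

```python
import numpy as np


def sorted_spectral_decomposition(X):
    """Return (Q, lam) with X^T X = Q diag(lam) Q^T and lam in descending order."""
    M = X.T @ X                    # (k+w) x (k+w), symmetric positive semidefinite
    lam, Q = np.linalg.eigh(M)     # eigh: ascending eigenvalues, orthonormal eigenvectors
    order = np.argsort(lam)[::-1]  # largest eigenvalue first, so Lambda_11 is the maximum
    return Q[:, order], lam[order]


# Usage with a hypothetical random 50 x 4 design matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Q, lam = sorted_spectral_decomposition(X)
```

Reordering the columns of Q together with the eigenvalues keeps each eigenvector paired with its eigenvalue, so $Q \Lambda Q^T$ still reproduces M.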

A reduced eigenvalue matrix $\Lambda_{\mathrm{red}}$ can be defined as:

$$\Lambda_{\mathrm{red}} = \Lambda \, \mathrm{diag}\big(A(\Lambda_{jj})\big) \qquad (17)$$

The matrix ("reduced design matrix") described above may be identified as:

$$X^* = Q \Lambda_{\mathrm{red}} Q^T \qquad (18)$$






One acceptance function that has been used is:

$$A(\lambda) = \begin{cases} 1 & \text{if } \lambda > \tau \Lambda_{11} \\ 0 & \text{otherwise} \end{cases} \qquad (19)$$



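where $\tau$ has been chosen to minimize $\mathrm{Var}(b)$ while maintaining $E\{b\}$. Examples of values for $\tau$ are in the range between $10^{\ldots}$ and $4 \times 10^{\ldots}$ for cases where k is about 37, i.e., where there are about 37 constituents or parts $P_1$ through $P_{37}$. Other values may be appropriate for smaller or larger numbers of constituents. Then b can be calculated as:

$$b = Q \Lambda_{\mathrm{red}}^{-1} Q^T X^T Y \qquad (20)$$

where the inverse of $\Lambda_{\mathrm{red}}$ is taken over its accepted (nonzero) diagonal entries only, the rejected entries remaining zero.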
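A minimal sketch of the shrinkage estimate of equations (16), (17), (19) and (20), combined with one possible reading of the repeated non-negativity cycle, is given below in Python with NumPy. The function names are hypothetical. The cycle shown is a simple heuristic assumed for illustration (columns whose weightings come out negative are removed and the fit is repeated); it is not a full non-negative least squares solver and is not necessarily the procedure used in the embodiments described above.

```python
import numpy as np


def shrinkage_coefficients(X, Y, tau):
    """b = Q Lambda_red^{-1} Q^T X^T Y, with eigenvalues <= tau * Lambda_11 rejected."""
    lam, Q = np.linalg.eigh(X.T @ X)
    order = np.argsort(lam)[::-1]               # sort eigenvalues largest first
    lam, Q = lam[order], Q[:, order]
    accept = lam > tau * lam[0]                 # acceptance function of equation (19)
    inv_red = np.zeros_like(lam)
    inv_red[accept] = 1.0 / lam[accept]         # invert accepted eigenvalues only
    return Q @ (inv_red * (Q.T @ (X.T @ Y)))    # equation (20)


def nonnegative_shrinkage_fit(X, Y, tau):
    """Hypothetical cycle: refit on the remaining columns, dropping any column whose
    weighting is negative, until every weighting is non-negative."""
    active = np.ones(X.shape[1], dtype=bool)
    while True:
        b = np.zeros(X.shape[1])
        if not active.any():
            return b                            # every column was removed
        b[active] = shrinkage_coefficients(X[:, active], Y, tau)
        negative = b < 0
        if not negative.any():
            return b                            # all weightings non-negative: done
        active &= ~negative                     # each pass removes at least one column


# Usage with hypothetical synthetic Lorentzian references
x = np.linspace(-1.0, 1.0, 400)
X = np.column_stack([1.0 / (1.0 + ((x - c) / 0.05) ** 2) for c in (-0.20, 0.00, 0.15)])
b = nonnegative_shrinkage_fit(X, X @ np.array([2.0, 0.5, 1.0]), tau=1e-10)
```

Raising tau rejects more of the small-eigenvalue directions, trading a little bias for lower variance in b; with tau near zero the estimate reduces to ordinary least squares.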



C. Configuration of Exemplary System for Acquiring and Calculating Lineshape

Referring now to Figure 7, a system 7 for acquiring and calculating the lineshape of a selected sample is illustrated. The system 7 includes an NMR spectrometer 10 for taking NMR measurements of a sample. In one embodiment, the spectrometer 10 is configured so that the NMR measurements are conducted at 400 MHz for proton signals; in other embodiments the measurements may be carried out at 360 MHz or another desired frequency. Other frequencies corresponding to a desired operational field strength may also be employed. Typically, a proton flow probe is installed, as is a temperature controller to maintain the sample temperature at 47 +/- 0.2 degrees C. Field homogeneity of the spectrometer 10 can be optimized by shimming on a sample of 99.8% D2O until the spectral linewidth of the HDO NMR signal is less than 0.6 Hz. The 90° RF excitation pulse width used for the D2O measurement is typically about 6-7 microseconds.

Referring again to Figure 7, the spectrometer 10 is controlled by a digital computer 11 or other signal processing unit. The computer 11 should be capable of performing rapid Fourier transformations and may include for this purpose a hard-wired sine table and a hard-wired multiply and divide circuit. It may also include a data link 12 to an external personal computer 13, and a direct-memory-access channel 14 which connects to a hard disc unit 15.

The digital computer 11 may also include a set of analog-to-digital converters, digital-to-analog converters and slow device I/O ports which connect through a pulse control and interface circuit 16 to the operating elements of the spectrometer. These elements include an RF transmitter 17, which produces an RF excitation pulse of the duration, frequency and magnitude directed by the digital computer 11, and an RF power amplifier 18, which amplifies the pulse and couples it to the RF transmit coil 19 that surrounds sample cell 20. The NMR signal produced by the excited sample in the presence of a 9.4 Tesla polarizing magnetic field produced by superconducting magnet 21 is received by a coil 22 and applied to an RF receiver 23. The amplified and filtered NMR signal is demodulated at 24, and the resulting quadrature signals are applied to the interface circuit 16, where they are digitized and input through the digital computer 11 to a file in the disc storage 15. The deconvolving module 350 (Figure 6) can be located in the digital computer 11 and/or in a secondary computer that may be on-site or remote.
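The correspondence between the 9.4 Tesla field and the 400 MHz proton operating frequency follows from the Larmor relation $f = (\gamma / 2\pi) B_0$. The short Python sketch below uses the proton value $\gamma / 2\pi \approx 42.577$ MHz/T; the function and constant names are illustrative.

```python
PROTON_GAMMA_BAR_MHZ_PER_T = 42.577478  # proton gyromagnetic ratio / (2*pi), in MHz per tesla


def larmor_frequency_mhz(b0_tesla):
    """Proton resonance frequency (MHz) in a polarizing field of b0_tesla."""
    return PROTON_GAMMA_BAR_MHZ_PER_T * b0_tesla


def field_for_frequency_t(f_mhz):
    """Polarizing field (T) at which protons resonate at f_mhz."""
    return f_mhz / PROTON_GAMMA_BAR_MHZ_PER_T


print(f"{larmor_frequency_mhz(9.4):.1f} MHz")  # 400.2 MHz
print(f"{field_for_frequency_t(360):.2f} T")   # 8.46 T
```

The second call shows the approximate field strength that the alternative 360 MHz operating frequency would correspond to.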

After the NMR data are acquired from the sample in the measurement cell 20, processing by the computer 11 produces another file that can, as desired, be stored in the disc storage 15. This second file is a digital representation of the chemical shift spectrum, and it is subsequently read out to the computer 13 for storage in its disc storage 25. Under the direction of a program stored in its memory, the computer 13, which may be a personal, laptop, desktop, or other computer, processes the chemical shift spectrum in accordance with the teachings of the present invention to print a report, which is output to a printer 26 or electronically stored and relayed to a desired email address or URL. Those skilled in this art will recognize that other output devices, such as a computer display screen, may also be employed for the display of results.

It should be apparent to those skilled in the art that the functions performed by the computer 13 and its separate disc storage 25 may also be incorporated into the functions performed by the spectrometer's digital computer 11. In such a case, the printer 26 may be connected directly to the digital computer 11. Other interfaces and output devices may also be employed, as are well known to those skilled in this art.

The invention will now be described in more detail in the following non- 
limiting examples. 
EXAMPLE 1
Acquisition of Sample Lineshape Data 

Sample lineshape data is obtained from the analysis method of interest. One example is acquiring blood or blood plasma sample NMR lineshape data as described above for measurement of exemplary reference samples. The same field-strength NMR spectrometer is used (typically 400 MHz), and it is set up to operate in the identical fashion used to acquire the lipoprotein reference spectra. Other frequencies and magnetic fields may be employed. The time-domain signal (FID) of the plasma sample is acquired in the identical fashion as the reference spectra, except that 2-transient spectra of up to 5 multiplicates are acquired rather than a single spectrum of 32 or more transients. Processing is carried out in an identical manner to produce a digitized representation of the blood plasma sample spectrum on the disk of the computer. The whole plasma spectrum is then accurately referenced to the sharp NMR resonance peak produced by the calcium complex of EDTA, which is present in the sample. The sample spectrum and the reference spectra are shifted as needed to align the CaEDTA peak at 2.519 ppm on the horizontal scale. Operations of the invention can also be carried out for spectral alignment processing using alternate internal reference signals. For example, operations of the present invention can be carried out for internal references such as, but not limited to, glucose or lactate signals that can be used for spectral alignment purposes. Other analysis methods as well as other signal acquisition techniques can be employed, depending on the application, as will be appreciated by those of skill in the art.
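The referencing step can be illustrated with a minimal sketch that locates the tallest point near the expected CaEDTA position and shifts the spectrum so that it falls at 2.519 ppm. The helper name, the 0.05 ppm search window, and the use of linear interpolation for resampling are assumptions made for illustration, not details taken from the embodiments above.

```python
import numpy as np


def align_to_reference_peak(ppm, intensity, target_ppm=2.519, window=0.05):
    """Shift a spectrum so the tallest point within +/- window ppm of target_ppm
    lands on target_ppm, then resample onto the original ppm axis.

    Hypothetical helper: window and linear interpolation are illustrative choices.
    """
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    in_window = np.abs(ppm - target_ppm) <= window
    if not in_window.any():
        raise ValueError("no points within the search window")
    peak_ppm = ppm[in_window][np.argmax(intensity[in_window])]
    offset = target_ppm - peak_ppm
    order = np.argsort(ppm)  # np.interp needs an increasing axis; NMR axes often descend
    return np.interp(ppm, ppm[order] + offset, intensity[order], left=0.0, right=0.0)


# Usage: hypothetical CaEDTA-like peak lying 0.011 ppm downfield of 2.519 ppm
ppm = np.linspace(5.0, 0.0, 5001)                     # descending chemical-shift axis
spectrum = 1.0 / (1.0 + ((ppm - 2.530) / 0.002) ** 2)
aligned = align_to_reference_peak(ppm, spectrum)
```

The same computed offset would be applied to any spectrum that must stay registered with the sample, and a glucose or lactate signal could be used by changing `target_ppm`.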

Precision Study 

Two samples, SA and SB, were repeatedly analyzed on several NMR instruments. Lipoprotein information was generated for each result using both the improved deconvolution method and the conventional or standard deconvolution method. Figure 8 illustrates the distribution of replicate measurements of LDL cholesterol concentration for samples SA and SB under both algorithms. The coefficient of variation ("CV") for each algorithm is shown in Table 1 and indicates the reproducibility of a given sample's measurement when it is repeatedly interrogated.
Table 1.

Sample   Algorithm         N    Spread (CV, %)   CV Ratio
SA       Present method    24   2.19
SA       Previous method   24   4.82             0.45
SB       Present method    24   1.28
SB       Previous method   24   2.41             0.53



In Table 1, CV = coefficient of variation, expressed as a percentage. CV Ratio is the ratio of the CV of the improved (present) method to the CV of the previous or conventional method.
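The CV Ratio column can be reproduced from the spread CV values in Table 1, as in the short Python check below. A generic CV helper is also shown; its use of the sample standard deviation (ddof=1) is an assumption, since the estimator is not stated above.

```python
import numpy as np

# Spread CV values (%) from Table 1 (N = 24 replicate analyses per sample and algorithm)
spread_cv = {
    "SA": {"present": 2.19, "previous": 4.82},
    "SB": {"present": 1.28, "previous": 2.41},
}

# CV Ratio = CV(present method) / CV(previous method)
cv_ratio = {s: round(v["present"] / v["previous"], 2) for s, v in spread_cv.items()}
print(cv_ratio)  # {'SA': 0.45, 'SB': 0.53}


def cv_percent(values):
    """Coefficient of variation in percent, using the sample SD (ddof=1, an assumption)."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()
```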

Randomly Selected Samples 

The spectra from a set of 595 randomly selected plasma samples were analyzed using both the improved method described above and the previous method. The spectrum of each sample was measured 5 times, and the variability among these five measurements was then characterized. Because this is a much larger selection of plasma samples than in the precision study, a wider variety of lipoprotein profiles can be encountered.

The difference between each LDL concentration measurement and the mean LDL concentration for a given subject was computed, and these differences were then plotted. Because each subject's results then have the same mean of zero, all subjects can be plotted on the same graph. Figure 9 illustrates the distribution of LDL concentrations (centered at zero) for both fitting methods. Figure 9 is a graph of the variability (or reproducibility) from repeatedly measured spectra in 595 randomly selected samples. For each subject, the five measurements were used to compute the deviation for that subject and algorithm, with smaller deviations being better.
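The per-subject centering described above can be sketched as follows. The helper name, the subjects-by-replicates array layout, and the use of the sample standard deviation (ddof=1) are assumptions made for illustration; the replicate values in the usage line are hypothetical.

```python
import numpy as np


def center_by_subject(measurements):
    """measurements: (subjects x replicates) array, e.g. five LDL values per subject.

    Returns each subject's values minus that subject's mean (so every row averages
    to zero and all subjects can share one plot), plus the within-subject SD.
    """
    m = np.asarray(measurements, dtype=float)
    centered = m - m.mean(axis=1, keepdims=True)
    within_sd = m.std(axis=1, ddof=1)  # sample SD across replicates (ddof=1 assumed)
    return centered, within_sd


# Usage with hypothetical replicate values for two subjects
centered, within_sd = center_by_subject([[100, 102, 98, 101, 99],
                                         [150, 150, 150, 150, 150]])
```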

It is believed that while the previous method produces results that are at least as good as other clinical evaluation techniques, the improved method is able to reduce variability and improve the reproducibility of measurements. In the data obtained, the within-sample coefficient of variation for LDL measurements is 2.34% (median) for the improved method and 6.32% (median) for the prior method. Thus, the new method can reduce the variability by over 50%; as shown in Table 2, the median reduction is about 63%.
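The 63% figure follows directly from the two medians reported in Table 2, as this one-line Python check shows.

```python
median_cv_current = 2.34   # median within-sample CV (%), improved method, Table 2
median_cv_previous = 6.32  # median within-sample CV (%), previous method, Table 2

reduction = 1.0 - median_cv_current / median_cv_previous
print(f"{reduction:.0%}")  # 63%
```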
Table 2. Distribution of standard deviations within samples, by algorithm.

Distribution of within-sample CV

Algorithm   Min     Q1      Median   Q3      Max
Current     0.81%   1.66%   2.34%    3.16%   7.52%
Previous    1.13%   3.93%   6.32%    9.14%   17.66%



Table 2 provides a measure of reproducibility using the coefficient of variation, CV = SD/mean. Because SD is proportional to the mean for the constituent measured in this example (at least for LDL concentration), the CV value is an appropriate characterization of variability.
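A summary in the layout of Table 2 (Min, Q1, Median, Q3, Max) can be computed from a set of within-sample CVs as sketched below. The helper name is hypothetical, and NumPy's default linear interpolation for quartiles is an assumption; the quartile convention used for Table 2 is not stated.

```python
import numpy as np


def five_number_summary(within_sample_cv):
    """Min, Q1, Median, Q3 and Max of a set of within-sample CVs, as laid out in Table 2.

    np.percentile's default linear interpolation is assumed for the quartiles.
    """
    q = np.percentile(np.asarray(within_sample_cv, dtype=float), [0, 25, 50, 75, 100])
    return dict(zip(["Min", "Q1", "Median", "Q3", "Max"], q.tolist()))


# Usage with hypothetical CV values (%)
print(five_number_summary([1.0, 2.0, 3.0, 4.0, 5.0]))
```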

The foregoing is illustrative of the present invention and is not to be construed 
as limiting thereof. Although a few exemplary embodiments of this invention have 
been described, those skilled in the art will readily appreciate that many modifications 
are possible in the exemplary embodiments without materially departing from the 
novel teachings and advantages of this invention. Accordingly, all such modifications 
are intended to be included within the scope of this invention as defined in the claims. 
In the claims, means-plus-function clauses, where used, are intended to cover the 
structures described herein as performing the recited function and not only structural
equivalents but also equivalent structures. Therefore, it is to be understood that the 
foregoing is illustrative of the present invention and is not to be construed as limited 
to the specific embodiments disclosed, and that modifications to the disclosed 
embodiments, as well as other embodiments, are intended to be included within the 
scope of the appended claims. The invention is defined by the following claims, with 
equivalents of the claims to be included therein. 